Abstract

A sufficient condition for the uniqueness of multinomial sequential unbiased 
estimators is provided generalizing a classical result for binomial samples. 
Unbiased estimators are applied to infer the parameters of multidimensional 
or multinomial Random Walks which are observed until they reach a bound- 
ary. An application to clinical trials is presented. 
1. Introduction

In many applications stochastic processes are used to model the behavior
of some phenomenon up to the first crossing of a threshold. This is the
case in neuronal modeling, population dynamics and ruin probabilities, to
mention just a few. Parametric inference is needed to calibrate such models
in order to obtain good fits with experimental data, and specific sequential
statistical methods are needed (cf. e.g. Bibbona and Ditlevsen (2010)). In
many cases Random Walks (RWs) might be used as toy models for such
phenomena. In the special case where the increments are independent Bernoulli
random variables, a classical result in binomial sequential estimation
(cf. Girshick et al. (1946)) may be applied to find an unbiased estimator. In
Savage (1947) (updating other references quoted therein) a sufficient
condition for the uniqueness of the unbiased estimator is found. If we have a
RW on a higher dimensional lattice or any other RW whose increments have $k$
possible outcomes with probabilities $p_1, \dots, p_k$, a generalization of
the above result still applies. Indeed in Koike (1993) and Kremers (1990)
unbiased sequential estimation is extended to the multinomial context. In
such a context a sufficient condition for the uniqueness of the unbiased
estimators is not available. The present letter fills this gap and presents a
few examples where unbiased estimation is applied to multidimensional or
multinomial boundary crossing RWs. An application of sequential estimation of
the multinomial probabilities that deserves special attention is that
following phase II multistage clinical trials (cf. Zee et al. (1999)) where
patients are classified according to their response to a treatment. A short
account of such application concludes the paper. Further relevant results
related to the main topic can be found in Bhat and Kulkarni (1966) regarding
efficient multinomial sampling plans, in Sinha and Sinha (1992) for a review
of the binomial case and in Sinha et al. (2008) for generalizations to the
quasi-binomial context.
2. Unbiased multinomial sequential estimation

We consider a repeated experiment having $k$ possible outcomes occurring
with probabilities $p_1, \dots, p_k$. Denote by $X_n = (x_n^1, \dots, x_n^k)$
the process whose components $x_n^i \in \mathbb{N}$ count how many occurrences
of events of type $i$ we had up to the $n$-th (independent) repetition. The
process $X_n$ lives in the hyperplane where the sum of the coordinates is $n$.
Denoting by $S_n \subset \mathbb{R}^k$ the portion of such plane where all the
coordinates are positive or null and by $S_n^{\mathbb{N}}$ the set of points
in $S_n$ with natural coordinates, for any $n$ we have
$X_n \in S_n^{\mathbb{N}}$.

Let $X_n$ be observed until it reaches the boundary $B$ of an accessible
region $R \subset \mathbb{N}^k$ (by boundary we mean those points which are
not in $R$ but that might be reached in one step starting from $R$).

For every point $y \in B$ with coordinates $(y_1, \dots, y_k)$ let us denote
by $k(y)$ the number of paths in $R$ that start at the origin and end in $y$
and by $k_i^*(y)$ the number of those that end in $y$ but start in the point
whose $i$-th coordinate is 1 and the others are 0. The probability that the
first hitting of the boundary occurs in $y$ is

$$\mathbb{P}(y) = k(y)\, p_1^{y_1} \cdots p_k^{y_k}. \qquad (1)$$

The region $R$ is defined to be closed if $\sum_{y \in B} \mathbb{P}(y) = 1$.



Theorem 2.1 (Koike (1993)). For any closed region $R$, the ratios

$$\hat{p}_i(y) = \frac{k_i^*(y)}{k(y)} \qquad (2)$$

are unbiased estimators for the probabilities $p_i$.
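
The path counts entering Theorem 2.1 can be computed mechanically for small
regions. The following Python sketch is only an illustration: the function
names, the membership predicate `in_region` and the toy region quoted after
the code are assumptions introduced here, and the estimator is taken to be
the ratio $k_i^*(y)/k(y)$ of the two path counts defined above. It assumes
that the region is exhausted within `max_steps` steps, and it treats a
starting point outside $R$ as a boundary point reached by the trivial path.

```python
from collections import defaultdict
from fractions import Fraction


def count_paths(start, in_region, k, max_steps):
    """Number of paths from `start` to each boundary point, staying in R before."""
    if not in_region(start):          # assumption: trivial path if start is not in R
        return {start: 1}
    alive = {start: 1}                # points of R reached so far, with path counts
    hits = defaultdict(int)           # boundary points, with path counts
    for _ in range(max_steps):
        nxt = defaultdict(int)
        for x, c in alive.items():
            for i in range(k):        # one more event of type i
                z = x[:i] + (x[i] + 1,) + x[i + 1:]
                (nxt if in_region(z) else hits)[z] += c
        alive = nxt
    return dict(hits)


def unbiased_estimators(in_region, k, max_steps):
    """Return k(y) and the ratios k_i^*(y) / k(y) for every boundary point y."""
    origin = (0,) * k
    total = count_paths(origin, in_region, k, max_steps)
    stars = [count_paths(origin[:i] + (1,) + origin[i + 1:], in_region, k, max_steps)
             for i in range(k)]
    est = {y: tuple(Fraction(s.get(y, 0), ky) for s in stars)
           for y, ky in total.items()}
    return total, est


def expectation(total, est, p):
    """Exact expected value of the estimators under P(y) = k(y) p^y."""
    out = [Fraction(0)] * len(p)
    for y, ky in total.items():
        prob = Fraction(ky)
        for p_i, y_i in zip(p, y):
            prob *= p_i ** y_i
        for i in range(len(p)):
            out[i] += est[y][i] * prob
    return out
```

With the toy region `lambda x: x[0] < 2 and sum(x) < 4` for $k = 3$ (finite,
hence closed), `expectation` returns the parameter vector itself for any
rational choice of `p`, in agreement with the theorem.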

A sufficient condition on the region $R$ for the estimator (2) to be the
unique bounded unbiased estimator for the binomial ($k = 2$) probability is
given in Savage (1947). We are going to generalize it to the multinomial
context. For any $n$ the region $R \subset \mathbb{N}^k$ and its boundary $B$
project onto $S_n^{\mathbb{N}}$ defining the accessible points of order $n$,
$R_n = R \cap S_n^{\mathbb{N}}$, the inaccessible points
$S_n^{\mathbb{N}} - R_n$ and (among them) the boundary points
$B_n = B \cap S_n^{\mathbb{N}}$. $R$ is said to be a simple region if for any
$n$ the convex hull $H(R_n)$ of $R_n$ does not contain inaccessible points.
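
In the binomial case the definition is easy to test: the points of order $n$
lie on a segment, so the convex hull of $R_n$ contains no inaccessible point
exactly when the accessible points of that order are consecutive. A minimal
Python sketch of this check follows; the function name and the predicate
`accessible` (assumed to describe $R_n$ exactly) are illustrative, and only
the orders up to `n_max` are inspected.

```python
def is_simple_binomial(accessible, n_max):
    """Check simplicity for k = 2 on the orders n = 0, ..., n_max.

    accessible(x1, x2) must return True exactly on the points of R.
    """
    for n in range(n_max + 1):
        xs = [x1 for x1 in range(n + 1) if accessible(x1, n - x1)]
        # R_n is convex on the segment iff its first coordinates are consecutive
        if xs and xs != list(range(xs[0], xs[-1] + 1)):
            return False
    return True
```

For $k > 2$ the same test requires a genuine convex hull computation, which
this sketch does not attempt.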

Theorem 2.2. If the region $R \subset \mathbb{N}^k$ is simple and closed, the
estimators (2) are the unique bounded unbiased estimators of the parameters
$p_i$.



We adapt the method in Savage (1947), but we need the following Lemma
(obvious when $k = 2$) that will be proved after the main theorem.

Lemma 2.3. Let $R$ be a simple region, and $n$ an order such that in
$S_n^{\mathbb{N}}$ there are both accessible and boundary points. Among any
collection of boundary points $C_n \subset B_n$ it is always possible to
choose a $\bar{y} \in C_n$ and a $(k-2)$-hyperplane $\pi_{\bar{y}}$ lying in
the $(k-1)$-hyperplane that contains $S_n$ such that

1. $\bar{y} \in \pi_{\bar{y}}$

2. $\pi_{\bar{y}}$ is identified by two linear equations

$$L(x) = m_1 x_1 + \cdots + m_k x_k = b, \qquad x_1 + \cdots + x_k = n \qquad (3)$$

where $m_i \in \mathbb{N}$, one of them vanishing and at least one
non-vanishing, and $b \in \mathbb{N}$.

3. on $R_n$ we have $L(x) \ge b + 1$

4. at any other boundary point $y \in C_n$, we have $L(y) \ge b + 1$

Proof of Theorem 2.2. If the theorem were false we would have another
unbiased estimator $U$ of $p_1$ and the difference $\Delta = \hat{p}_1 - U$
would be a non-identically vanishing unbiased estimate of zero. Since the
first boundary point $y$ hit by the process is a sufficient statistic (cf.
Ferguson (1967), Section 7.3, Lemma 1), we restrict to those estimators that
are functions of it and
$\mathbb{E}(\Delta) = \sum_{y \in B} \Delta(y)\, \mathbb{P}(y) = 0$. Let $m$
be the smallest integer such that $\Delta$ is not vanishing at one element of
$B_m$. If $R_m = \emptyset$ for such $m$ then the region $R$ is finite and the
thesis follows from Theorem 4 in Kremers (1990). If instead
$S_m^{\mathbb{N}}$ contains accessible points we apply Lemma 2.3 to the
collection $C_m$ of boundary points $y \in B_m$ such that $\Delta(y) \ne 0$
and find a point $\bar{y}$ and a linear combination
$L(x) = m_2 x_2 + \cdots + m_k x_k$ (for notational convenience we stipulate
that the vanishing coefficient is the first one) with $m_i \in \mathbb{N}$
such that $L(\bar{y}) = b$ and that for any $z \in C_m \cup R_m$ other than
$\bar{y}$ we have $L(z) \ge b + 1$. A fortiori $L(y) \ge b + 1$ at any $y$ in
any $R_n$ with $n > m$ since any such $y$ may only be reached evolving from an
$x \in R_m$. For some positive $\Delta^*$ (a bound for $|\Delta|$, which
exists because both estimators are bounded) we have

$$|\Delta(\bar{y})|\, k(\bar{y})\, p_1^{\bar{y}_1} \cdots p_k^{\bar{y}_k}
= \Bigl| \sum_{\substack{L(y) \ge b+1 \\ \Delta(y) \ne 0}} \Delta(y)\, \mathbb{P}(y) \Bigr|
\le \Delta^* \sum_{\substack{L(y) \ge b+1 \\ \Delta(y) \ne 0}} \mathbb{P}(y). \qquad (4)$$

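We are going to show that there are values of the parameters at which
such inequality cannot hold. By construction any path from the origin to a
$y \in B$ such that $\Delta(y) \ne 0$ and $L(y) \ge b + 1$ either ends in
$C_m$ or crosses $R_m$. In $R_m \cup C_m$ (the point $\bar{y}$ with
$L(\bar{y}) = b$ left aside) we have a finite number $F$ of points
$z^1, \dots, z^F$ and there $L(z^s) \ge b + 1$. For any $y \in B$ such that
$\Delta(y) \ne 0$ and $L(y) \ge b + 1$ we have

$$\mathbb{P}(y) = \mathbb{P}(y \mid R_m \cup C_m)\, \mathbb{P}(R_m \cup C_m)
= \mathbb{P}(y \mid R_m \cup C_m) \sum_{s=1}^{F} k(z^s)\, p_1^{z^s_1} \cdots p_k^{z^s_k}. \qquad (5)$$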



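Let us now choose the parameters $p_2, \dots, p_k$ in such a way that for
some common factor $0 < \rho < 1$ we have $p_i = \rho^{m_i}$ for any
$i = 2, \dots, k$. Since $L(z^s) \ge b + 1$ we get

$$\mathbb{P}(y) \le \mathbb{P}(y \mid R_m \cup C_m)\, \rho^{b+1} \sum_{s=1}^{F} k(z^s)\, p_1^{z^s_1}$$

and inequality (4), once divided by $\rho^{b}$, becomes

$$|\Delta(\bar{y})|\, k(\bar{y})\, p_1^{\bar{y}_1}
\le \rho\, \Delta^* \sum_{s=1}^{F} k(z^s)\, p_1^{z^s_1}
\sum_{\substack{L(y) \ge b+1 \\ \Delta(y) \ne 0}} \mathbb{P}(y \mid R_m \cup C_m)
\le \rho\, \Delta^* \sum_{s=1}^{F} k(z^s)\, p_1^{z^s_1}$$

that is always violated when $\rho$ is small enough. □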

Proof of Lemma 2.3. Existence of a $y'$ and of a $\pi_{y'}$ satisfying
conditions 1. and 3. with rational coefficients in (3) is ensured by the
Separating Hyperplane theorem (cf. Ferguson (1967), Sec. 2.7) and the density
of $\mathbb{Q}$ in $\mathbb{R}$. To get natural coefficients in (3) it is then
sufficient to multiply the first equation by a suitable integer and to add to
it the second equation a sufficient number of times. Let us denote by
$L'(x) = b'$ the new equation of $\pi_{y'}$ meeting the first three
conditions. Condition 4. may still not be fulfilled by $\pi_{y'}$. Let us
denote by $c \le b'$ the minimum value taken by $L'$ on $C_n$ and let us
consider the plane $\pi_c$ with first equation $L'(x) = c$. If it intersects
$C_n$ in one and only one point we have found both the point and the plane
satisfying condition 4. If $C_n \cap \pi_c$ contains more than one point, let
us select one with the following algorithm. Start with the last coordinate
$x_k$ and select the points in $C_n \cap \pi_c$ where $x_k$ is largest. Among
them choose those at which $x_{k-1}$ is largest and continue until the choice
of the largest $j$-th coordinate singles out one and only one point $\bar{y}$
of $C_n \cap \pi_c$. Now consider the plane $\pi_{\bar{y},r}$ with first
equation

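$$L_r(x) = L'(x) - \frac{1}{r^{k}} x_1 - \frac{1}{r^{k-1}} x_2 - \cdots - \frac{1}{r} x_k
= c - \frac{1}{r^{k}} \bar{y}_1 - \frac{1}{r^{k-1}} \bar{y}_2 - \cdots - \frac{1}{r} \bar{y}_k = b_r. \qquad (6)$$

Of course $\pi_{\bar{y},r}$ still passes through $\bar{y}$, and equation (6),
once multiplied by $r^k$, has integer coefficients. Moreover, since $R_n$ is
finite and since $L'(x) - c > 0$ for any $x \in R_n$, we can take $r$ large
enough to ensure both that $L_r(x) - b_r > 0$ for every $x \in R_n$ and that
the coefficients are natural. The same argument applies to the points in
$C_n - \pi_c$. Moreover for any $y \in C_n \cap \pi_c$ other than $\bar{y}$ we
have

$$L_r(y) - b_r = \frac{1}{r^{k}} (\bar{y}_1 - y_1) + \frac{1}{r^{k-1}} (\bar{y}_2 - y_2) + \cdots + \frac{1}{r} (\bar{y}_k - y_k)$$

which, for $r$ large enough, is certainly positive due to the algorithm we
used to select $\bar{y}$: the last coordinate at which $y$ and $\bar{y}$
differ carries the dominant weight, and there $\bar{y}$ is larger. □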
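
The selection rule used in the proof is a lexicographic maximum read from the
last coordinate backwards. The short Python sketch below illustrates it; the
function names are hypothetical, and the weights in `gap` assume that the
perturbation gives the last coordinate the largest weight ($1/r$) and the
first coordinate the smallest ($1/r^k$), which is what makes the sum positive
when $r$ exceeds the coordinate differences.

```python
from fractions import Fraction


def select_point(points):
    """Pick the point with largest x_k, ties broken by x_{k-1}, and so on."""
    return max(points, key=lambda y: tuple(reversed(y)))


def gap(y_bar, y, r):
    """Sum over i = 1..k of (y_bar_i - y_i) / r**(k + 1 - i), computed exactly."""
    k = len(y)
    return sum(Fraction(a - b, r ** (k - j))
               for j, (a, b) in enumerate(zip(y_bar, y)))
```

For the four points `(3, 0, 1)`, `(0, 3, 1)`, `(2, 2, 0)` and `(1, 2, 1)` of
order 4 the rule singles out `(0, 3, 1)`, and with `r = 10` the gap with
respect to each of the other three points is strictly positive.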

3. Examples 

In the following examples we derive the unbiased estimators for some mul- 
tidimensional or multinomial RWs observed up to the crossing of a boundary. 

3.1. RWs on a bidimensional lattice

Let $W_i$ be a RW on $\mathbb{Z}^2$ such that $W_0 = 0$ and
$W_i = W_{i-1} + I_i$ where the increments $I_i$ take the values (0,1), (1,0),
(0,-1) and (-1,0) with probabilities $p_1$, $p_2$, $p_3$ and
$p_4 = 1 - \sum_{i=1}^{3} p_i$. Let $W_i$ be observed up to the first time $N$
its second component equals $b > 0$. The process
$X_n = (x_n^1, \dots, x_n^4)$ whose components count how many occurrences of
increments of type $i$ we had up to the $n$-th step of the RW is of the kind
described in Section 2 and it is observed until it hits
$B = \{x \in \mathbb{N}^4 : x^1 - x^3 = b\}$. The accessible region is simple,
and it is closed whenever $p_1 \ge p_3 > 0$. The maximum likelihood (ML)
estimators of the $p_i$ are $X_N^i / N$, while the unique unbiased estimators
(2) are

$$\hat{p}_1 = \frac{b-1}{b} \cdot \frac{X_N^1}{N-1}, \qquad
\hat{p}_2 = \frac{X_N^2}{N-1}, \qquad
\hat{p}_3 = \frac{b+1}{b} \cdot \frac{X_N^3}{N-1}.$$

The trajectory count is based on the reflection principle (cf. Feller (1971)).
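
The closed forms can be cross-checked against a brute-force path count. The
Python sketch below is an illustration under stated assumptions: the function
names are introduced here, the threshold is assumed to satisfy $b \ge 2$, only
boundary points hit within `n_max` steps are compared, and the closed forms
coded in `closed_form` are $\frac{b-1}{b}\frac{y_1}{N-1}$, $\frac{y_2}{N-1}$
and $\frac{b+1}{b}\frac{y_3}{N-1}$ with $N = y_1 + y_2 + y_3 + y_4$.

```python
from collections import defaultdict
from fractions import Fraction


def first_hits(start, b, n_max):
    """Count paths from `start` that first reach x1 - x3 = b within n_max steps."""
    alive, hits = {start: 1}, defaultdict(int)
    for _ in range(n_max - sum(start)):
        nxt = defaultdict(int)
        for x, c in alive.items():
            for i in range(4):                      # one more increment of type i
                z = x[:i] + (x[i] + 1,) + x[i + 1:]
                (hits if z[0] - z[2] == b else nxt)[z] += c
        alive = nxt
    return hits


def closed_form(y, b):
    """Closed-form estimators of p1, p2, p3 at the boundary point y."""
    n = sum(y)
    return (Fraction(b - 1, b) * Fraction(y[0], n - 1),
            Fraction(y[1], n - 1),
            Fraction(b + 1, b) * Fraction(y[2], n - 1))


def check(b, n_max):
    """True if k_i^*(y) / k(y) equals the closed form at every counted point."""
    origin = (0, 0, 0, 0)
    k = first_hits(origin, b, n_max)
    stars = [first_hits(origin[:i] + (1,) + origin[i + 1:], b, n_max)
             for i in range(3)]
    return bool(k) and all(
        tuple(Fraction(s[y], ky) for s in stars) == closed_form(y, b)
        for y, ky in k.items())
```

Calling `check(3, 9)` compares the two expressions on every boundary point
reached in at most nine steps when $b = 3$.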

The results of a simulation study performed on 10,000 paths are shown in
Table 1. The performances of the two methods are not much different and the
best choice depends on the parameter range. When $p_1$ is close to $p_3$ some
of the unbiased estimators have a smaller mean square error than the
corresponding ML estimators, while when $p_1$ is higher the ML estimates are
better. Let us remark that the parameters $p_2$ and $p_4$, in the direction
along which the RW is not constrained, are estimated much better than the
other two.

                    ML estimators             Unbiased estimators
                 mean     sd     m.s.e.      mean     sd     m.s.e.
  p_1 = 0.4      0.436   0.081   0.0078      0.400   0.080   0.0063
  p_2 = 0.15     0.148   0.045   0.0020      0.150   0.046   0.0020
  p_3 = 0.3      0.268   0.078   0.007       0.200   0.087   0.008

  p_1 = 0.7      0.727   0.123   0.016       0.701   0.130   0.017
  p_2 = 0.1      0.095   0.072   0.005       0.101   0.077   0.006
  p_3 = 0.1      0.084   0.085   0.007       0.098   0.098   0.010

Table 1: Results of inference on a simulated sample of RWs on a bidimensional
lattice stopped as soon as their second component reaches the threshold value
b = 10.

3.2. A simple RW allowing for null steps 

Let $W_i$ be a RW on $\mathbb{Z}$ such that $W_0 = 0$ and
$W_i = W_{i-1} + I_i$ where the increments are 1, 0 or -1 with probabilities
$p_1$, $p_2$ and $p_3 = 1 - \sum_{i=1}^{2} p_i$. Again we count the increments
by $X_n = (x_n^1, x_n^2, x_n^3)$. $W_i$ is observed up to the first time $N$
it equals $b > 0$, and $X_n$ until $x_n^1 - x_n^3 = b$. The accessible region
is simple and, whenever $p_1 \ge p_3 > 0$, also closed. ML estimators are
again the sample proportions, and the unbiased ones are

$$\hat{p}_1 = \frac{b-1}{b} \cdot \frac{X_N^1}{N-1}, \qquad
\hat{p}_2 = \frac{X_N^2}{N-1}.$$
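
A sketch of how ratios of this form follow from Theorem 2.1, assuming
$b \ge 2$ and the hitting time (ballot) theorem for counting first-passage
paths. The symbols $y = (y_1, y_2, y_3)$ for the boundary point,
$N = y_1 + y_2 + y_3$ and $v = y_1 + y_3$ for the number of non-null steps
are introduced here only for the derivation. The last step of every path is
an up-step, the $v - 1$ remaining non-null steps are placed among the first
$N - 1$ positions, and the non-null steps form a first-passage path to $b$.

```latex
\begin{align*}
k(y)     &= \binom{N-1}{v-1}\,\frac{b}{v}\binom{v}{y_1}, \\
k_1^*(y) &= \binom{N-2}{v-2}\,\frac{b-1}{v-1}\binom{v-1}{y_1-1}, \\
k_2^*(y) &= \binom{N-2}{v-1}\,\frac{b}{v}\binom{v}{y_1}, \\
\frac{k_1^*(y)}{k(y)}
         &= \frac{v-1}{N-1}\cdot\frac{b-1}{v-1}\cdot\frac{v}{b}\cdot\frac{y_1}{v}
          = \frac{b-1}{b}\,\frac{y_1}{N-1}, \\
\frac{k_2^*(y)}{k(y)}
         &= \frac{N-v}{N-1} = \frac{y_2}{N-1}.
\end{align*}
```

In $k_1^*$ the first step is an up-step, so the remaining non-null steps form
a first-passage path to $b - 1$ of length $v - 1$; in $k_2^*$ the first step
is null and the non-null part is unchanged.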
4. Sequential multinomial estimation and clinical trials



In a multinomial multistage phase II cancer trial (cf. Zee et al. (1999)) a
group of patients is treated with a new drug and then classified as responders
if tumor shrinkage is more than 50%, non-responders if it is less and early
progressions if they undergo a progress in the disease. A decision is taken
whether to stop the trial and conclude that the therapy is promising (or
ineffective) if the responders are more (fewer) than a predetermined value and
the early progressions are fewer (more) than another value. In the
intermediate case when the number of responders or of early progressions is
between the thresholds a new group of patients is enrolled and the trial
continues to a next stage. Estimation of the probability of response and early
progressions after such trials matters in practice. In the case of a binomial
trial (patients are either responders or non-responders) the presence of a
bias from ML was already noticed in Jung and Kim (200…) and unbiased
estimators were studied. The design proposed in Zee et al. (1999) was the
following: let $K$ be the maximum number of stages allowed and $n_s$ for
$s = 1, \dots, K$ the number of patients enrolled in each stage. We denote by
$N_s = \sum_{i \le s} n_i$ the number of patients involved up to the $s$-th
stage. The process

$X_j = (r_j, j - r_j - e_j, e_j)$ counts the number of responders,
non-responders and early progressions among the first $j$ patients. For any
$j \ne N_s$ the trial is continued, but when $j = N_s$ for some $s < K$ there
are three options:

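1. the trial is stopped and the therapy considered promising if $r_{N_s}$ is
at least the upper response threshold of stage $s$ and $e_{N_s}$ is at most
the lower early progression threshold of that stage; such points form a
stopping region

2. the trial is stopped and the therapy considered ineffective if $r_{N_s}$ is
at most the lower response threshold of stage $s$ and $e_{N_s}$ is at least
the upper early progression threshold of that stage; such points form a second
stopping region

3. the trial is continued to stage $s + 1$ if $r_{N_s}$ lies strictly between
the two response thresholds or $e_{N_s}$ lies strictly between the two early
progression thresholds; such continuation region is denoted by $R_{N_s}$.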

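The trial ends at a random stage $S \le K$ with a final observation
$X_{N_S} = (r, N_S - r - e, e)$. The probabilities of response and of an early
progression can be estimated by means of the unbiased estimators (2) that are

$$\hat{p}_1(r, N_S - r - e, e) =
\frac{\sum_{R_{N_1}} \sum_{R_{N_2}} \cdots \sum_{R_{N_{S-1}}}
\binom{n_1 - 1}{r'_1 - 1,\, y_1,\, e'_1} \binom{n_2}{r'_2,\, y_2,\, e'_2} \cdots \binom{n_S}{r'_S,\, y_S,\, e'_S}}
{\sum_{R_{N_1}} \sum_{R_{N_2}} \cdots \sum_{R_{N_{S-1}}}
\binom{n_1}{r'_1,\, y_1,\, e'_1} \binom{n_2}{r'_2,\, y_2,\, e'_2} \cdots \binom{n_S}{r'_S,\, y_S,\, e'_S}}$$

$$\hat{p}_3(r, N_S - r - e, e) =
\frac{\sum_{R_{N_1}} \sum_{R_{N_2}} \cdots \sum_{R_{N_{S-1}}}
\binom{n_1 - 1}{r'_1,\, y_1,\, e'_1 - 1} \binom{n_2}{r'_2,\, y_2,\, e'_2} \cdots \binom{n_S}{r'_S,\, y_S,\, e'_S}}
{\sum_{R_{N_1}} \sum_{R_{N_2}} \cdots \sum_{R_{N_{S-1}}}
\binom{n_1}{r'_1,\, y_1,\, e'_1} \binom{n_2}{r'_2,\, y_2,\, e'_2} \cdots \binom{n_S}{r'_S,\, y_S,\, e'_S}}$$

where $\binom{n}{r,\, y,\, e} = \frac{n!}{r!\, y!\, e!}$ denotes the
multinomial coefficient (set to zero when an entry is negative),
$r'_s = r_{N_s} - r_{N_{s-1}}$, $e'_s = e_{N_s} - e_{N_{s-1}}$ and
$y_s = n_s - r'_s - e'_s$ are the numbers of responders, early progressions
and non-responders recorded during stage $s$ (with $r_{N_0} = e_{N_0} = 0$,
$r_{N_S} = r$ and $e_{N_S} = e$), and the sums are performed over the triples
$(r_{N_i}, N_i - r_{N_i} - e_{N_i}, e_{N_i})$ belonging to the continuation
regions $R_{N_i}$ with $i < S$.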
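
For a two-stage design the sums of multinomial coefficients can be evaluated
directly. The Python sketch below is only an illustration: the function
names, the stage sizes and the continuation rule used in the usage note are
hypothetical and are not the thresholds of Zee et al. (1999). The estimator
is computed as the ratio of the number of paths starting from the $i$-th unit
point to the number of paths starting from the origin, with outcomes ordered
as (responders, non-responders, early progressions).

```python
from fractions import Fraction
from math import factorial


def multinomial(n, parts):
    """n! / (parts[0]! parts[1]! parts[2]!), or 0 if the split is impossible."""
    if min(parts) < 0 or sum(parts) != n:
        return 0
    out = factorial(n)
    for a in parts:
        out //= factorial(a)
    return out


def stage_outcomes(n):
    """All (responders, non-responders, early progressions) among n patients."""
    return [(r, n - r - e, e) for r in range(n + 1) for e in range(n + 1 - r)]


def path_counts(n1, n2, continues, first=None):
    """Paths to each final observation of a two-stage trial.

    first=None counts paths from the origin (k); first=i forces the first
    patient into class i, i.e. counts paths from the i-th unit point (k*_i).
    """
    counts = {}
    for s1 in stage_outcomes(n1):
        if first is None:
            c1 = multinomial(n1, s1)
        else:
            rest = tuple(a - (j == first) for j, a in enumerate(s1))
            c1 = multinomial(n1 - 1, rest)
        if not continues(s1):                 # trial stops after stage 1
            counts[s1] = counts.get(s1, 0) + c1
            continue
        for s2 in stage_outcomes(n2):         # trial stops after stage 2
            y = tuple(a + b for a, b in zip(s1, s2))
            counts[y] = counts.get(y, 0) + c1 * multinomial(n2, s2)
    return counts


def estimator_table(n1, n2, continues):
    """Estimators k*_i(y) / k(y) at every final observation y, and k itself."""
    k = path_counts(n1, n2, continues)
    stars = [path_counts(n1, n2, continues, first=i) for i in range(3)]
    est = {y: tuple(Fraction(s[y], ky) for s in stars) for y, ky in k.items()}
    return est, k


def expected(est, k, p):
    """Exact expectation of the three estimators at parameter p."""
    total = [Fraction(0)] * 3
    for y, ky in k.items():
        prob = ky * p[0] ** y[0] * p[1] ** y[1] * p[2] ** y[2]
        for i in range(3):
            total[i] += est[y][i] * prob
    return tuple(total)
```

With three patients in the first stage, two in the second and the
hypothetical rule `lambda s: s[0] == 1 or s[2] == 1` for continuing,
`expected` returns the parameter vector itself for any rational `p`, as
expected from Theorem 2.1 since the region is finite.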

5. Conclusion 

The main result of the paper is to prove that simplicity of the accessible
region $R$ is a sufficient condition to ensure the uniqueness of the unbiased
estimators (2). Of course the availability (and the uniqueness) of unbiased
estimators does not mean that they are the best way to estimate the
parameters, and the simulation study performed on RWs in Sec. 3 shows that
there are both parameter ranges where the unbiased estimators are superior
to ML and vice versa. The bias of the ML estimators, moreover, can be
reduced as in Whitehead (1986) or by bootstrapping, and the best method to
be used needs to be decided case by case. Multinomial clinical trials provide
an important application of the method presented.